Abstract

For about a decade, model-based reasoning has been advocated by a
number of researchers. Perhaps one of the most convincing arguments in favor
of this kind of reasoning was given by Davis in his paper on diagnosis
from first principles (Davis 1984). Following these guidelines, we have
developed a system to verify the behavior of the satellite-based instrument
GOME, which will be measuring ozone concentrations in the near future
(1995). We start by giving a description of model-based monitoring. Besides
recognizing that something is wrong, we would also like to find the cause
of the misbehavior automatically. Therefore, we show how the monitoring
technique can be extended to model-based diagnosis.


1 Introduction 

1.1 Testing complex systems 

Before space systems, like satellite-based instruments, go into orbit, it is impor- 
tant to validate the system’s functioning thoroughly. However, as systems be- 
come more and more complex, the effort needed to verify these systems becomes 
enormous. Traditional testing methods validate system behavior by applying 
test inputs and comparing observed to expected output behavior. Care must be 
taken that all possible interactions between subsystems are covered. Unfortu- 
nately, experience shows that it is nearly impossible to do complete testing, and 
most systems possess some unknown, and unwanted, behavior. In these cases it
is very important to know whether the system behaves correctly (e.g., when it
is in orbit). For example, a faulty component of an ozone-measuring instrument
may influence the measurements negatively. So, it is important to recognize
malfunctioning as soon as possible. However, it is simply impossible for a
human controller to monitor the system’s performance in every detail. An
automatic system is needed to keep track of the system.

In this paper we describe a technique to validate systems: model-based mon- 
itoring and diagnosis. Note that there is no intent to replace existing test tech- 
niques; it is an additional method that is used to detect the errors that remain 
after traditional testing and while the system is in operation. The test method
presented here is model-based. That is to say, a behavior description is used to
predict how the system should behave, and the predictions are compared to the 
actual observations. If an inconsistency arises, then it is assumed that something 
is wrong and an error is signaled. 

1.2 GOME

We have applied the test method to verify the GOME instrument (ESA 1993). 
GOME, short for Global Ozone Monitoring Experiment, is an instrument that 
will be mounted on ESA’s ERS-2 satellite. Its purpose is to measure ozone
concentrations in the earth’s atmosphere. This is done by comparing the sun’s
spectrum, measured directly, to the spectrum of sunlight that has been
reflected and has travelled twice through the earth’s atmosphere.

Apart from a diode array for measuring the spectra, the instrument has a
number of supporting subsystems, such as a command interpreter for
interpreting and executing commands sent by ground control; a data acquisition
unit for sending the measured spectrum and housekeeping data to ground
control; a mirror unit for scanning the earth’s atmosphere; a heating unit for
temperature control; etc. All in all, GOME is a rather complicated system and
its behavior is hard to verify.

1.3 Overview of the paper 

In Section 2, we start by describing a monitoring system that is used to verify
GOME’s behavior. A monitoring system checks whether a system is functioning
correctly; however, it does not report the cause of a malfunction. Reporting
the cause is part of the functionality of a diagnostic system. In Section 3, we
extend the description to a diagnostic system that is currently being
implemented for GOME. Finally, in Section 4, some conclusions are drawn and
future work is described.


2 Monitoring 

As already described, we have implemented a model-based form of monitoring.
We assume that something is wrong whenever the model’s predictions contradict
the observations of the system’s behavior. That is, a description of
normative behavior is used to verify the system.

In this section we will (1) formalize the method of model-based monitoring in
such a way that it is easily extended to model-based diagnosis (see Section 3);
(2) describe its implementation; and (3) discuss some of the results of
applying a monitor program to the GOME instrument.

2.1 Characterizing model-based monitoring

In this section we will establish a conceptual framework for defining model-based 
monitoring. Central in this framework, and also in that for diagnosis, is that we 
view a description of system behavior (a model) as a formal system. That is, 
the behavior description is a set of sentences taken from some kind of language 
with a logic attached. It is important to note that we do not restrict ourselves to 
predicate or first-order logics. To the contrary, we view formal systems in which 
algebraic or differential equations can be expressed as important candidates for 
logics in which the behavior of a system can be expressed. Viewing the behavior 
description as a formal system eases the definition and implementation of
monitoring and diagnostic systems, but may also introduce notations that seem
awkward in the context of system theory. For example, a numerical integration
step is, in the logical context, considered as a derivation rule; e.g., Euler’s
rule can be stated as:

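\[
\frac{x(t) = c_1 \ \text{with}\ c_1 \in \mathbb{R}^n, \qquad x'(t) = A\,x(t)}
     {x(t+1) = A c_1 + c_1}
\]

with $x(t) \in \mathbb{R}^n$ and $A \in \mathbb{R}^n \times \mathbb{R}^n$. The
derivation of $\sigma$ from a set $\Sigma$ is denoted as:

\[
\Sigma \vdash \sigma .
\]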

Consider for example the case of dynamical simulation. Let SIMMOD denote 
a dynamical simulation model and INIT its initial conditions both expressed in 
some formal system with Euler’s integration step as derivation rule. Then the set 

\[
PRED = \{\, p : SIMMOD \cup INIT \vdash p \,\}
\]

contains all the predictions that can be obtained by applying the derivation rules 
of the formal system. 
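
As an illustration only, the following minimal Python sketch generates a finite part of such a prediction set by repeatedly applying an Euler step with unit step size to x'(t) = A x(t). The function names, the example matrix and the initial condition are hypothetical and are not taken from the GOME model.

```python
def euler_step(A, x, h=1.0):
    """Apply one Euler step to x'(t) = A x(t): x(t+h) = x(t) + h * A x(t)."""
    n = len(x)
    dx = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    return [x[i] + h * dx[i] for i in range(n)]


def predictions(A, x0, steps):
    """Return the predicted states x(0), x(1), ..., x(steps)."""
    states = [list(x0)]
    for _ in range(steps):
        states.append(euler_step(A, states[-1]))
    return states


# Hypothetical model (SIMMOD) and initial condition (INIT).
A = [[0.0, 1.0],
     [-1.0, 0.0]]
pred = predictions(A, [1.0, 0.0], 2)
```

With h = 1 each step computes x(t+1) = A c + c for the current state c, which is the derivation rule stated above.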

Using the logical terminology, we define a system to be monitored as follows: 

Definition 2.1 A system to be monitored is a triple $(OBS, MODULES, SD_m)$,
where

• OBS, the observations, is a finite set of observations each of the form

\[
v = \langle\text{value}\rangle ,
\]

where $v$ is a variable, and $\langle\text{value}\rangle$ a value of appropriate type.
• MODULES, the modules, is a finite set of so-called modules. Modules are
introduced to denote subsystems that are supposed to be functioning
independently. For each module a separate set of behavior relations is defined,
as will be explained next.

• $SD_m$, the system description (for monitoring), is a finite consistent set of
behavior relations for each module. The general form of a modular behavior
specification is as follows:

\[
M \supset \langle\text{behavior relations for } M\rangle ,
\]

where $M \in MODULES$ and $\supset$ denotes material implication (if ..., then ...).

Due to possibly incomplete knowledge of the system, e.g. the current state
is not known, we allow alternative behavioral relations per module; however,
exactly one of these behavioral relations must be true. That is, each instance
of “$\langle\text{behavior relations for } M\rangle$” is of the form:

\[
rel_1 \oplus \cdots \oplus rel_n ,
\]

where $rel_i$ is e.g. an algebraic or differential equation, and
$\sigma_1 \oplus \sigma_2$ denotes the fact that either $\sigma_1$ or $\sigma_2$
is true, but not both [1]. $\oplus$ is also called a choice operator.

When monitoring a system, an error message must be generated whenever the
predictions made by the system description contradict the observations.
A contradiction occurs whenever a prediction assigns a value to a variable that
is incompatible with the observations [2]. Deciding whether two values are
incompatible is problem and type dependent. For example, for real-valued
variables normally a range on the values is defined; for variables with a
discrete domain the values have to match exactly. Furthermore, because the
different modules are assumed to be working independently, we can give an
indication of where something is going wrong by stating the module responsible
for generating the contradiction. This leads to the following.

Definition 2.2 Let $(OBS, MODULES, SD_m)$ be a system to be monitored. An
error message for a module $M \in MODULES$ is generated whenever

\[
SD_m \cup OBS \cup \{M\}
\]

is inconsistent [3].

[1] $\sigma_1 \oplus \sigma_2$ is an abbreviation for $\sigma_1 \lor \sigma_2$, and $\neg\sigma_1 \lor \neg\sigma_2$.

[2] Because $SD_m$ is assumed to be consistent, contradictions may only occur due to a mismatch
between predictions and observations.

[3] Note that the presence of $M$ in the formula enables the use of its behavior relations.
It is important to note that we assume that only one module, and no
combination of modules, is responsible for a contradiction. In other words,
multiple faults (de Kleer and Williams 1987) are not captured by this definition
(they are handled in the section on diagnosis, see Section 3). Note, however,
that during a monitoring session more than one module may generate an error
message. If we recall the initial purpose of the monitoring system, viz.
verifying that a system functions correctly, the restriction to single faults
is not that serious. We assume that the modules are chosen so that one module
captures the behavior of a subset of the system constituents. On the occurrence
of an inconsistency, we know that the culprit is to be found within that subset.

2.2 Implementation 

We have implemented a monitoring system to verify GOME’s behavior. In
Figure 1 the overall layout of the program is given. To simplify the
implementation, the observations (OBS) of GOME’s behavior are first stored on
(Bernoulli) disks before GOME’s operation is analyzed. A snapshot (the values
of all GOME’s variables) is taken every 1.5 seconds and is stored in what is
called archive data. The contents of a single snapshot are called a packet.
Packet numbers are used to address packets.

The expected behavior comprises the system description per module [4]. $SD_m$
can be considered as a kind of simulation model of the system where the
behavior relations are centered around the modules. Note that $SD_m$ is not
truly a simulation model because the choice operator introduces alternative
behaviors per module. So, no conclusive predictions can be made using $SD_m$;
it can only be used to do a consistency check.

[4] In the current implementation the program and the system description are coded in
C (Kernighan and Ritchie 1978). The behavioral relations are encoded as procedures; a more
elegant implementation, at least from a logical and a maintenance perspective, would
use a declarative description of both the behavior and the derivation (e.g. Euler’s rule) relations.
The monitor program reads each packet from the archive data (OBS), ‘simulates’
each module $M$ using the expected behavior ($SD_m$) and checks if

\[
SD_m \cup OBS \cup \{M\}
\]

is inconsistent. If so, an error message with some additional information is
printed.

The following example gives some feeling for the implementation of the mon- 
itoring program. 

Example 2.1: Consider the operation of setting the mirror’s mode.

Informally, $SD_m$ contains relations that describe the following behavior:

If a command that sets the mirror in swath mode is in the current
packet, then after N packets [5] (= N x 1.5 seconds) the mirror position
changes according to a linear relation defined on the packet number.

Now, if a mirror-setting command is found in the current packet, the monitor
program checks the mirror position after a delay of N packets. □
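
A minimal Python sketch of such a delayed check is given below. It is an illustration under stated assumptions: the delay is passed in as a constant, the command names, the packet field names, the slope, the offset and the tolerance are all hypothetical, and the check is repeated for every packet from the due packet onward until another command arrives.

```python
def check_swath(packets, delay, slope, offset, tol):
    """Return the packet numbers at which the mirror position contradicts
    the expected linear relation pos = slope * packet_number + offset."""
    errors = []
    due = None  # first packet number at which the relation must hold
    for k, packet in enumerate(packets):
        cmd = packet.get("command")
        if cmd == "SWATH":
            due = k + delay
        elif cmd is not None:
            due = None  # assumption: any other command cancels the expectation
        if due is not None and k >= due:
            expected = slope * k + offset
            if abs(packet["mirror_pos"] - expected) > tol:
                errors.append(k)
    return errors


# Hypothetical archive data: one dictionary per packet.
archive = [
    {"command": None, "mirror_pos": 0.0},
    {"command": "SWATH", "mirror_pos": 0.0},
    {"command": None, "mirror_pos": 0.0},
    {"command": None, "mirror_pos": 6.0},
    {"command": None, "mirror_pos": 9.5},
]
bad_packets = check_swath(archive, delay=2, slope=2.0, offset=0.0, tol=0.5)
```

In this invented data the swath command arrives in packet 1, so the relation is checked from packet 3 onward; packet 4 deviates by more than the tolerance and is reported.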

As an extra, the monitor program prints for each packet what we call a
behavior summary, containing the most important status information on GOME’s
operation. For example, the behavior summary contains the last submitted
command, the mirror’s and coolers’ mode, and so on. This extra information
comes in handy when searching for the cause of a malfunction (either manually
or automatically with a diagnostic system).

2.3 Results 

The monitoring system as described above has been applied to the GOME
instrument. It should be clear that a monitoring system does not perform a
full functional test. Types of behavior that are not enabled during the
verification process will not be tested for correct functioning. As we have
already mentioned, it is an additional method of testing. Although GOME was
tested fairly intensively, the monitoring program did expose a number of
faults. To give some feeling for the type of faults, we name a few: (1) the
integration time (for measuring sunlight) was set incorrectly on a number of
occasions; (2) synchronization faults of timers on receipt of a command; (3) a
timer that operates too slowly; (4) inaccurate scan mirror positioning during
swath mode; (5) documentation faults (process variables other than those
documented are measured); etc.


3 Diagnosing 

The monitoring program has proved useful for validating the correct
functioning of GOME. However, when an error message is generated, the cause
of the misbehavior has to be searched for manually. It is interesting to have a
system that not only recognizes that something is wrong, but is also able to
find the cause of the misbehavior. This functionality is part of diagnostic
systems.

[5] Actually, this number N depends on the current packet number modulo 4.

3.1 Characterization

Similar to model-based monitoring, the characterization of model-based diagnosis 
uses a logical terminology, see e.g. (Reiter 1987; de Kleer and Williams 1987; 
de Kleer and Williams 1989). 

Definition 3.1 A system to be diagnosed is, again, a triple (OBS, COMP, SD),
where

• OBS, a finite set of observations, defined as in the case of monitoring.

• COMP, a finite set of components. Components are akin to modules; however,
behavior is assigned to individual components. In this way, it is possible
to extract the components responsible for a discrepancy between observed and
expected behavior.

• SD, the system description (for diagnosis), similar to the model-based
monitoring case, except that the behavior relations are defined per component.

In the diagnostic case we assume that a component is working in a mode. A
mode represents a physical ‘condition’ (so to speak) of a component. For
example, we have:

- A normal mode, i.e. the component is working as intended.

- One or more fault modes, i.e. the (faulty) component is working
according to a known behavioral relation.

- An abnormal mode, i.e. the component is not working as intended but
we have not anticipated its fault behavior as in the previous case.

Now, the general form of a behavior relation is:

\[
Mode(c) \supset \langle\text{governing eq.}\rangle ,
\]

where $c \in COMP$, and “$\langle\text{governing eq.}\rangle$” describes how the
component’s variables are governed when $c$ is working in mode $Mode(c)$. If
$Mode(c)$ is the abnormal mode, then the equation is such that no predictions
can be made.

To each mode of a component a prior probability is assigned. This prior 
probability is used during the computation of diagnoses as will be explained 
shortly. 
SD can be considered as a component-centered simulation model. That is,
behavior relations are given per component, so components responsible for a
discrepancy can be isolated. In GOME’s case we have the following.

Example 3.1: We consider two components, $c_m$ and $c_c$, representing the
mirror unit and the command interpreter, respectively. A normally functioning
mirror unit ($c_m$) can scan the earth’s atmosphere either at a fixed position
or rotating:

• At a fixed position, indicated by the predicate $fixed(c_m, t)$ being true for
all time instances $t$ at which the mirror is fixed. The position of the mirror
at time instance $t$ has a constant value: $pos(t) = m^*$ [6].

• With a scan angle, indicated by the predicate $swath(c_m, t)$ being true for
all time instances $t$ at which the mirror is rotating. The position of the
mirror at time instance $t$ has a value that is linearly dependent on $t$,
described by the function $f(t)$ [7].

In Figure 2 the behavior of the mirror is given. 



\[
Normal(c_m) \supset
\left[
\begin{array}{l}
fixed(c_m, t) \supset pos(t) = m^* \\
\lor \\
swath(c_m, t) \supset pos(t) = f(t)
\end{array}
\right]
\]

Figure 2: Mirror unit behavior


The command interpreter $c_c$ sets, among other things, the predicates
$fixed(c_m, t)$ and $swath(c_m, t)$ if a corresponding command has been
received [8], see Figure 3.

□

Now a diagnosis is an assignment of modes such that no predictions can be 
made that are contradictory to the observations. We use the following definitions. 

Definition 3.2 A mode assignment is a conjunction of mode predicates for all
$c \in COMP$:

\[
\bigwedge_{c \in COMP} Mode_c(c).
\]

[6] This is simplified; actually the position can be controlled.

[7] Again this is simplified; it is possible to control the maximum angle of rotation.

[8] It is assumed that once the predicate $fixed(c_m, t)$ or $swath(c_m, t)$ is believed, it stays true
until it is explicitly asserted false.
Figure 3: Command interpreter behavior

Definition 3.3 A diagnosis for a triple (OBS, COMP, SD) is a mode assignment
$\Gamma$ such that:

\[
SD \cup OBS \cup \{\Gamma\}
\]

is consistent.

Recall that assuming a mode assignment ($\Gamma$) results in a set of governing
equations describing expected behavior. If this set of equations predicts a
value that is inconsistent with the observations, the assumption represented by
$\Gamma$ must be wrong. That is, the mode assignment $\Gamma$ is not a diagnosis.

In general there are multiple diagnoses, and computing all diagnoses can be
very time consuming. However, most of the time we are only interested in the
most probable ones (de Kleer and Williams 1989). Using the prior probabilities
of the modes, we first test the most likely mode assignments for consistency.
If the consistency test succeeds, the posterior probability can be computed by
incorporating the number of observations that are explained by the mode
assignment [9], as is described in (de Kleer and Williams 1989).

If a highly probable diagnosis $\Gamma$ contains one or more fault (or abnormal)
modes, it is likely that the corresponding components are the culprits.

[9] Note that the mode assignment which assigns the abnormal mode to all components always yields
a consistent theory, but does not explain any observation.

Example 3.2: Consider the example of the mirror unit and the command
interpreter again. Assume that we observe that the mirror is not moving after
a swath command has been given. Using only these observations, we can
only assume that the mirror unit or the command interpreter (or both) is
malfunctioning. However, the command interpreter controls other components
as well. So, if we see that, e.g., the integration time is set correctly, then
it is less likely that the command interpreter is malfunctioning and only the
diagnosis stating that the mirror unit is the culprit remains. □
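
The idea of testing the most likely mode assignments first can be sketched in Python as follows. This is a simplified illustration and not the algorithm of (de Kleer and Williams 1989): it orders complete mode assignments by the product of their prior probabilities (assuming independent components), uses a hand-written stand-in for the consistency test, and does not compute posterior probabilities. All component names, priors and observation fields are hypothetical.

```python
import heapq
from math import prod


def best_first(modes):
    """Yield (prior, assignment) pairs in order of non-increasing prior.

    modes maps each component to a list of (mode, prior) pairs.
    """
    comps = sorted(modes)
    ranked = [sorted(modes[c], key=lambda mp: -mp[1]) for c in comps]

    def prior(idx):
        return prod(ranked[k][i][1] for k, i in enumerate(idx))

    start = (0,) * len(comps)
    heap, seen = [(-prior(start), start)], {start}
    while heap:
        neg_p, idx = heapq.heappop(heap)
        yield -neg_p, {c: ranked[k][i][0] for k, (c, i) in enumerate(zip(comps, idx))}
        # Successors: move one component to its next most likely mode.
        for k in range(len(comps)):
            if idx[k] + 1 < len(ranked[k]):
                nxt = idx[:k] + (idx[k] + 1,) + idx[k + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(heap, (-prior(nxt), nxt))


def consistent(gamma, obs):
    """Hypothetical stand-in for the consistency test of Definition 3.3."""
    if gamma["interpreter"] == "normal" and not obs["integration_time_ok"]:
        return False  # a normal interpreter sets the integration time correctly
    if (gamma["interpreter"] == "normal" and gamma["mirror"] == "normal"
            and obs["swath_commanded"] and not obs["mirror_moving"]):
        return False  # with both normal, a swath command makes the mirror move
    return True


def first_diagnosis(modes, obs):
    """Return the first (most probable a priori) consistent mode assignment."""
    for p, gamma in best_first(modes):
        if consistent(gamma, obs):
            return p, gamma
    return None


modes = {"mirror": [("normal", 0.95), ("abnormal", 0.05)],
         "interpreter": [("normal", 0.99), ("abnormal", 0.01)]}
obs = {"swath_commanded": True, "mirror_moving": False, "integration_time_ok": True}
p, gamma = first_diagnosis(modes, obs)
```

With these invented priors the assignment in which both components are normal is tested first and rejected, and the next candidate, an abnormal mirror unit with a normal command interpreter, is returned as the first diagnosis.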

Implementation The implementation of the diagnostic system that is currently
under development is based on the work of de Kleer, Williams and
Forbus, see (de Kleer and Forbus 1993). We make use of a best-first search
algorithm in order to compute the most probable diagnoses, see (de Kleer and
Williams 1989) for a detailed description.

3.2 Multiple models 

The problem with contemporary diagnostic systems is twofold: (1) It is hard to 
construct a behavior model (SD); and (2) the computation of diagnoses is very 
hard. 

Concerning the construction of a behavior model, one has to realize that in
order to obtain nontrivial diagnoses more than one aspect of system behavior
must be described. For example, as Davis (Davis 1984) points out, for the
detection of a solder bridge between two pins of an IC, not only an electrical
but also a geometrical model is needed. That is, one needs different views on
system behavior. In the case of space systems many aspects, such as electrical,
mechanical, and thermal ones, play an essential role in the behavior of a
system.

Concerning the computational hardness: in general, the time needed to compute a
set of most probable diagnoses is exponential in the number of
components/relations in the behavior description. This means that there is no
guarantee that a set of most probable diagnoses can be computed in acceptable
time.

As a solution to both problems, approximations of behavior descriptions have
been proposed, see e.g. (Struss 1992; Bos 1994; Nayak 1994). There are two
special types of approximations: weak and strong abstractions.

We start with weak abstractions. 

Definition 3.4 A system description $SD_1$ is weaker than $SD_0$ (the more accurate
description), if everything that can be derived from $SD_1$ can also be derived from
$SD_0$.

Weak abstractions can be used to construct views, i.e., models describing a
single aspect (or a restricted set of aspects) of behavior. Other examples of
weak abstractions include qualitative reasoning schemes (de Kleer and Brown
1984; Forbus 1984) for continuous systems, and temporal abstractions (Hamscher
1991) for digital systems. Weak abstractions can be used to speed up reasoning
using the following property:
Property 3.1 If a combination of mode assignments (a conflict set) yields an
inconsistent set (see Definition 3.3) for the weaker description, then that
combination will also yield an inconsistent set for the more accurate
description (Bos 1994).

In general, reasoning over an abstraction is less costly than over the more
accurate description. So, we may start reasoning over the abstractions to get
(relatively) fast but coarse [10] diagnoses. If we would like to refine the
answers, we know that the conflict sets found so far need not be considered
again. In this way we can prune the search space induced by the more accurate
description.
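
The pruning step can be illustrated with a small Python sketch. It assumes, for illustration only, that a conflict set is represented as a set of (component, mode) pairs found inconsistent under the weaker description, and that every complete mode assignment containing a conflict set can be skipped when reasoning over the more accurate description. The component and mode names are hypothetical.

```python
from itertools import product


def all_assignments(modes):
    """Enumerate every complete mode assignment as a dictionary."""
    comps = sorted(modes)
    return [dict(zip(comps, choice))
            for choice in product(*(modes[c] for c in comps))]


def prune(candidates, conflicts):
    """Drop every assignment that contains a known conflict set."""
    return [g for g in candidates
            if not any(conflict <= set(g.items()) for conflict in conflicts)]


modes = {"interpreter": ["normal", "abnormal"],
         "mirror": ["normal", "abnormal"]}
# Conflict set assumed to have been found with the weaker description.
conflicts = [frozenset({("interpreter", "normal"), ("mirror", "normal")})]
remaining = prune(all_assignments(modes), conflicts)
```

Only the three remaining assignments would have to be tested against the more accurate, and more costly, description.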

Strong abstractions are defined as [11]:

Definition 3.5 A system description $SD_1$ is stronger than $SD_0$ (the more
accurate description), if everything that can be derived from $SD_0$ can also
be derived from $SD_1$.

Strong abstractions can be applied where the original description allows for a
choice between two or more outcomes. For example, if the original model
describes that a certain event occurs either in this time instance or in the
next, the strong abstraction states one of the possibilities. Strong
abstractions can also be used to speed up reasoning by using the following.

Property 3.2 If a combination of mode assignments yields a consistent set,
i.e. a diagnosis (see Definition 3.3), for the stronger description, then that
combination will also be a diagnosis for the more accurate description (Bos 1994).

So, if one chooses one of the outcomes by selecting a strong abstraction and no 
contradictions are found, then in the more accurate description contradictions 
will also not be found. 
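
Both properties can be summarized in one short derivation. The notation $\mathrm{Th}(SD)$ for the set of sentences derivable from $SD$ is introduced here for illustration and is not part of the original definitions; 'stronger' is read as the converse of 'weaker', and the derivation relation is assumed to be monotonic and transitive, as in classical logics.

```latex
% Requires amsmath and amssymb. Th(SD) = { \sigma : SD \vdash \sigma } is
% notation assumed for this sketch only.
\begin{align*}
SD_1 \text{ weaker than } SD_0 &\iff \mathrm{Th}(SD_1) \subseteq \mathrm{Th}(SD_0), \\
SD_1 \text{ stronger than } SD_0 &\iff \mathrm{Th}(SD_0) \subseteq \mathrm{Th}(SD_1).
\end{align*}
% Property 3.1, with SD_1 weaker than SD_0: a contradiction derived from
% fewer consequences can still be derived from more consequences.
\[
SD_1 \cup OBS \cup \{\Gamma\} \vdash \bot
\;\Longrightarrow\;
SD_0 \cup OBS \cup \{\Gamma\} \vdash \bot .
\]
% Property 3.2, with SD_1 stronger than SD_0: the contrapositive of the
% same argument with the roles of SD_0 and SD_1 exchanged.
\[
SD_1 \cup OBS \cup \{\Gamma\} \nvdash \bot
\;\Longrightarrow\;
SD_0 \cup OBS \cup \{\Gamma\} \nvdash \bot .
\]
```

Read this way, a weak abstraction can only be used to rule candidates out, and a strong abstraction can only be used to confirm them.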

In (Struss 1992; Nayak 1994; Bos 1994) heterogeneous frameworks for multiple
models are proposed. In these frameworks it is possible to have multiple
abstractions of a given model, and these abstractions can be stated in
different languages. For example, both a qualitative model (de Kleer and Brown
1984; Forbus 1984) and a hierarchical abstraction (Hamscher 1991) can be used
as an approximation of, say, a differential model. So, a modeler can select the
formalism best suited for describing (an approximation of) system behavior. The
result is a partial order on system descriptions, see Figure 4 for an example.
In this figure, $SD_i \rightarrow SD_j$ denotes the fact that $SD_i$ is a (weak
or strong) abstraction of $SD_j$.

[10] Because the abstractions are weaker than the accurate descriptions we may, for example,
overlook a diagnosis.

[11] It is important to note that stronger is not equivalent to more accurate.
4 Conclusions and future work

Conclusions We have developed a monitoring system using a model-based 
technique. Such a system can be of great help for verifying system behavior. 
We have applied the monitoring system to the GOME instrument and revealed a
number of discrepancies between expected and observed behavior. However, a
monitoring system does not pinpoint the cause of malfunctioning; therefore, a diagnostic
system should be used. A diagnostic system can be defined in a way similar to 
monitoring systems. 

Future work We are currently developing a diagnostic system for GOME. The
system will make use of abstractions in order to speed up reasoning and to
describe different aspects of system behavior.